7 research outputs found

    High throughput spatial convolution filters on FPGAs

    Get PDF
    Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility

    Lightweight programmable DSP block overlay for streaming neural network acceleration

    Get PDF
    Implementations of hardware accelerators for neu- ral networks are increasingly popular on FPGAs, due to flex- ibility, achievable performance and efficiency gains resulting from network optimisations. The long compilation time required by the backend toolflow, however, makes rapid deployment and prototyping of such accelerators on FPGAs more difficult. Moreover, achieving high frequency of operation requires sig- nificant low-level design effort. We present a neural network overlay for FPGAs that exploits DSP blocks, operating at near their theoretical maximum frequency, while minimizing resource utilization. The proposed architecture is flexible, enabling rapid runtime configuration of network parameters according to the desired network topology. It is tailored for lightweight edge implementations requiring acceleration, rather than the highest throughput achieved by more complex architectures in the datacenter

    Network intrusion detection using neural networks on FPGA SoCs

    Get PDF
    Network security is increasing in importance as systems become more interconnected. Much research has been conducted on large appliances for network security, but these do not scale well to lightweight systems such as those used in the Internet of Things (IoT). Meanwhile, the low power processors used in IoT devices do not have the required performance for detailed packet analysis. We present an approach for network intrusion detection using neural networks, implemented on FPGA SoC devices that can achieve the required performance on embedded devices. The design is flexible, allowing model updates in order to adapt to emerging attacks

    Exploring the capabilities of FPGA DSP blocks in neural network accelerators

    No full text
    Neural networks have contributed significantly in applications that had been difficult to implement with the traditional programming concepts (e.g. computer vision, natural language processing). In many occasions, they outperform their hand coded counterparts and are increasingly popular in end user applications. Neural networks, however, are compute and memory demanding, making their execution in resource constraint devices more difficult, especially for real time applications. Custom computing architectures on Field-Programmable Gate Arrays (FPGAs) have traditionally been used to accelerate such computations to meet specific requirements. Nonetheless, most approaches in the literature do not consider in detail the underlying FPGA architecture, resulting in less efficient implementations. They additionally have focused on complex designs optimised for high throughput in a datacenter setting with access to large datasets in memory. Meanwhile real edge applications are often processing streaming sensor data and require consideration of efficiency. Detailed FPGA implementations involve time consuming low level design effort, which in turn result in long turnaround time. FPGAs have evolved over the years to include hard macro blocks, for example Digital Signal Processing (DSP) blocks, that map more efficiently widely used operations. In addition, FPGAs are often tightly coupled with embedded microprocessors in a System-on-Chip (SoC) arrangement that offers a complete system solution. This thesis explores the capabilities of FPGA DSP blocks in neural network accelerators. Within this context, practices and tools that improve turnaround time have been explored, drawing conclusions on how to exploit DSP blocks in a way that maximises performance and efficiency. Finally, the work in this thesis shows that designing overlays in an architecture-centric manner can result in high operating frequency, while scaling to better utilise FPGA resources

    High performance pipelined FPGA implementation of the SHA-3 hash algorithm

    No full text
    The SHA-3 cryptographic hash algorithm is standardized in FIPS 202. We present a pipelined hardware architecture supporting all the four SHA-3 modes of operation and a high-performance implementation for FPGA devices that can support both multi-block and multi-message processing. Experimental results on different FPGA devices validate that the proposed design achieves significant throughput improvements compared to the available literature.EUROMICRO,Faculty of Electrical Engineering of the University of Montenegro,IEEE,MANT,Ryazan State Radio Engineering Universit

    Pipelined SHA-3 implementations on FPGA: Architecture and performance analysis

    No full text
    .Efficient and high-throughput designs of hash functions will be in great demand in the next few years, given that every IPv6 data packet is expected to be handled with some kind of security features. In this paper, pipelined implementations of the new SHA- 3 hash standard on FPGAs are presented and compared aiming to map the design space and the choice of the number of pipeline stages. The proposed designs support all the four SHA-3 modes of operation. They also support processing of multiple messages each comprising multiple blocks. Designs for up to a four-stage pipeline are presented for three generations of FPGAs and the performance of the implementations is analyzed and compared in terms of the throughput/area metric. Several pipeline designs are explored in order to determine the one that achieves the best throughput/area performance. The results indicate that the FPGA technology characteristics must also be considered when choosing an efficient pipeline depth. Our designs perform better compared to the existing literature due to the extended optimization effort on the synthesis tool and the efficient design of multiblock message processing
    corecore